Abstract. Many current recognition systems use constrained search to locate objects
in  cluttered  environments.  Earlier  analysis  of  one  class  of  methods  has  shown  that  the 
expected  amount  of  search  is  quadratic  in  the  number  of  model  and  data  features,  if  all 
the  data  is  known  to  come  from  a  single  object,  but  is  exponential  when  spurious  data  is 
included.  To  overcome  this,  many  methods  terminate  search  once  an  interpretation  that 
is  “good  enough”  is  found.  In  this  paper,  we  formally  examine  the  combinatorics  of  this 
approach,  showing  that  choosing  correct  termination  procedures  can  dramatically  reduce 
the  search.  In  particular,  we  provide  conditions  on  the  object  model  and  the  scene  clutter 
such  that  the  expected  search  is  polynomial.  The  analytic  results  are  shown  to  be  in 
agreement  with  empirical  data  for  cluttered  object  recognition. 

Most current approaches to the recognition and localization of objects from
noisy  sensor  data  in  cluttered  environments  utilize  a  search  process  to  find  solutions 
to  the  problem.  Typically,  this  search  finds  interpretations  of  the  data  by  identifying 
pairings of data features to model features that are consistent with a rigid
transformation of the object model into sensor coordinates. There are many variations
on this approach, including hypothesize and test methods [e.g. Lowe 1985, 1987,
Ayache & Faugeras 1986, Huttenlocher & Ullman 1987, Huttenlocher 1989],
maximal clique methods [e.g. Bolles & Cain 1982] and constrained tree search methods
[e.g. Grimson & Lozano-Perez 1984, 1987, Gaston & Lozano-Perez 1984, Murray
1987a, 1987b, Murray & Cook 1988, Drumheller 1987]. Formal analysis of the last
class  of  methods  [Grimson  1989]  has  shown  that  performance  is  very  different  when 
all  of  the  sensory  data  are  known  to  have  come  from  a  single  object,  as  opposed  to 
sensory  data  that  includes  spurious  features.  If  all  of  the  data  are  known  to  have 
come  from  a  single  object,  the  expected  amount  of  search  required  to  find  a  correct 
interpretation  is  on  the  order  of 

$$O(m^2 + \alpha m s)$$

where m is the number of model features, s is the number of data features, and $\alpha$ is
a  small  constant.  In  most  of  the  problems  of  interest,  s  >  m  so  that  the  expected 
amount  of  search  is  quadratic  in  the  parameters  of  the  problem,  and  is  linear  in  the 
number  of  data-model  pairings.  On  the  other  hand,  if  spurious  data  is  allowed,  the 
expected  amount  of  search  is  bounded  above  by  an  expression  on  the  order  of 

$$O\left(m s 2^c + m^2 s^2 [1 + \epsilon]^c + \delta m^6 + m [1 + \gamma]^s\right),$$

where again m is the number of model features, s is the number of sensor features,
of which c correctly arise from the object, and $\epsilon, \gamma < 1$ are small constants, and it
is bounded below by an expression on the order of

$$o(m 2^c + m s).$$

Depending on the specific parameters of the problem, different terms of these
expressions will dominate, but in general, the expected search is now exponential in
the  problem  size. 

This implies that one of the hard parts of the recognition problem is in
separating out a correct subset of the data from the spurious data, where by correct we
mean a subset of the data that arises from a single object. One means of
attacking this problem is to use grouping mechanisms to preselect likely subspaces of the
search  space  on  which  to  focus.  This  can  be  done  in  a  data  driven  fashion  [Lowe 
1985,  1987,  Jacobs  1987,  1988].  It  can  also  be  done  in  a  model  driven  manner,  for 
example,  by  using  the  generalized  Hough  transform  [Ballard  1981].  In  [Grimson 
&  Huttenlocher  1988]  we  investigated  the  combinatorics  of  using  such  schemes.  In 
particular,  we  showed  that  while  such  methods  could  reduce  the  size  of  the  search 
space,  they  could  not,  in  general,  be  used  to  select  subspaces  for  which  all  of  the 
sensory features came from a single object, without at the same time incurring a
non-trivial  false  positive  rate. 

A second approach is to use heuristic criteria to terminate the search process
once an interpretation that is “good enough” has been found. In this paper we
examine this alternative. The usual method used for terminating search [e.g. Ayache
& Faugeras 1986, Grimson & Lozano-Perez 1987, Lowe 1985, 1987] is to measure
the  “goodness”  of  the  interpretation,  by  determining  what  fraction  of  the  object  is 
accounted  for  by  the  interpretation,  and  to  terminate  the  search  when  that  measure 
exceeds  some  threshold.  Typical  measures  include  the  number  of  model  features 
included  in  the  interpretation,  or  the  amount  of  perimeter  or  surface  area  of  the 
model  included  in  the  interpretation.  In  using  such  methods,  there  are  two  questions 
of  interest.  The  first  is  establishing  first  principles  methods  for  setting  the  threshold 
for termination. In [Grimson & Huttenlocher 1989] we address this question on a
formal  basis,  showing  that  estimates  for  the  threshold  may  be  found  as  a  function 
of  the  clutter  of  the  scene  and  the  size  of  the  model,  such  that  no  false  positive 
interpretations  will  be  expected.  In  this  paper,  we  turn  to  the  second  question, 
namely,  to  what  extent  does  the  use  of  premature  termination  of  the  search  reduce 
the  expected  cost  of  the  search  process  itself. 

1.  The  constrained  search  model. 

To  determine  the  expected  cost  using  premature  termination,  we  first  establish  the 
search  framework  to  be  used  in  solving  the  recognition  problem.  We  then  review 
results  from  earlier  analysis  of  the  full  constrained  search  method,  before  deriving 
new  results  on  the  use  of  premature  termination. 

We begin by reviewing the constrained search method, used previously in [Grimson
& Lozano-Perez 1984, 1987, Gaston & Lozano-Perez 1984, Murray 1987a, 1987b,
Murray & Cook 1988, Drumheller 1987] as a basis for recognizing and locating
objects. The approach seeks to match data features to model features in a manner that
is  consistent  with  some  rigid  transformation  of  the  model  into  the  sensory  data.  We 
assume  that  our  models  are  represented  by  sets  of  geometric  features,  such  as  edges, 
distinctive  points,  surface  patches,  axes  of  cylinders,  etc.,  and  that  the  sensory  data 
has been processed to obtain similar features. There are many methods for finding
matches between such features; the approach taken here is to explore the space of
possible  correspondences  by  searching  a  tree  of  interpretations. 

This  tree  search  can  be  defined  as  follows.  Suppose  we  order  the  data  features 
in  some  arbitrary  fashion.  We  select  the  first  data  feature,  and  hypothesize  in  turn 
that  it  is  in  correspondence  with  each  of  the  model  features.  We  represent  this  set 
of  alternatives  as  a  set  of  nodes  at  the  same  level  of  a  tree  (see  Figure  1). 

Given each one of these hypothesized assignments of data feature $f_1$ to a model
feature, $F_j$, $j = 1, \ldots, m$, we turn to the second data feature. Again, we can consider
all possible assignments of the second data feature $f_2$ to model features, relative to
each  of  the  assignments  of  the  first  data  feature.  This  is  shown  in  Figure  2.  Note 
that  the  entire  set  of  nodes  in  the  second  level  of  the  tree  corresponds  to  all  possible 
matches  for  the  first  two  data  features. 

Figure 1. We can build a tree of possible interpretations, by first considering all the ways
of matching the first data feature, $f_1$, to each of the model features, $F_j$, $j = 1, \ldots, m$.


Figure  2.  For  each  pairing  of  the  first  data  feature  with  a  model  feature,  we  can  consider 
matchings  for  the  second  data  feature  with  each  of  the  model  features.  Each  node  in  the 
second  level  of  the  tree  defines  a  pairing  for  the  first  two  data  points,  found  by  tracing  up 
the  tree  to  the  nodes.  An  example  is  shown. 

We  can  continue  in  this  manner,  adding  new  levels  to  the  tree,  one  for  each 
data feature. A node of the interpretation tree at level n describes a partial
n-interpretation, in that the nodes lying directly between the current node and the
root  of  the  tree  identify  an  assignment  of  model  features  to  the  first  n  data  features. 
Any  leaf  of  the  tree  defines  a  complete  s-interpretation,  where  s  is  the  total  number 
of  data  features. 

Our goal is to find consistent k-interpretations, where k is as large as possible,
$k \le s$, and to find these interpretations with as little effort as possible. A
simple-minded method would examine each leaf of the tree, testing to see if there exists a
rigid transformation mapping each model feature into its associated data feature.
This  is  clearly  too  expensive,  as  it  simply  reverts  to  an  exploration  of  the  entire, 
exponential-size,  search  space.  A  better  solution  is  to  explore  the  interpretation  tree, 
starting  at  its  root,  and  testing  interpretations  as  we  move  downward  in  the  tree.  As 
soon as we find a node that is not consistent, i.e. for which no rigid transform will
correctly align the model and data features, we terminate any further downward search
below  that  node,  as  adding  new  data-model  pairings  to  the  interpretation  defined 
at  that  node  will  not  turn  an  inconsistent  interpretation  into  a  consistent  one. 

In  testing  for  consistency  at  a  node,  we  have  two  different  choices.  We  could 
explicitly solve for the best rigid transformation, and test that all of the model
features do in fact get mapped into agreement with their corresponding data features.
This approach has two drawbacks. First, computing such a transformation is
generally computationally expensive (however, see [Faugeras & Hebert 1986, Ayache
& Faugeras 1986] for an efficient method for updating transformations), and we
would like to avoid any unnecessary use of such a computation. Second, in order to
compute such a transformation, we will need an interpretation of at least k
data-model pairs, where k depends on the characteristics of the features. This means we
must  wait  until  we  are  at  least  k  levels  deep  in  the  tree,  before  we  can  apply  our 
consistency  test,  and  this  increases  the  amount  of  work  that  must  be  done. 

Our  second  choice  is  to  look  for  less  complete  methods  for  testing  consistency. 
We  instead  seek  constraints  that  can  be  applied  at  any  node  of  the  interpretation 
tree,  with  the  property  that  while  no  single  constraint  can  uniquely  guarantee  the 
consistency  of  an  interpretation,  each  constraint  can  rule  out  some  interpretations. 
The  hope  is  that  if  enough  independent  constraints  can  be  combined  together,  their 
aggregation  will  prove  powerful  in  determining  consistency,  but  at  a  lower  cost  than 
fully  solving  for  a  transformation. 

In  previous  work,  we  developed  a  set  of  unary  and  binary  constraints  that  can 
be applied to this problem [Grimson & Lozano-Perez 1984, 1987]. For example, if
we  are  matching  edge  segments  from  a  grey-level  image,  one  unary  constraint  is  that 
the  length  of  the  data  edge  must  not  be  longer  than  the  corresponding  model  edge, 
plus  some  bounded  amount  of  error.  Binary  constraints  apply  to  pairs  of  data-model 
pairings,  for  example,  the  angle  between  two  data  edges  must  be  roughly  the  same 
as  the  angle  between  the  corresponding  model  edges,  and  the  range  of  distances 
between  a  pair  of  data  edges  must  be  contained  within  the  corresponding  range  of 
distances  for  a  pair  of  model  edges,  adjusted  for  error,  and  so  on.  Hence,  if  a  unary 
constraint,  applied  to  such  a  pairing,  is  true,  then  this  implies  that  the  data-model 
pairing  may  be  part  of  a  consistent  interpretation.  If  it  is  false,  however,  then  that 
pairing  cannot  possibly  be  part  of  such  an  interpretation.  Binary  constraints  apply 
to  pairs  of  data-model  pairings,  with  the  same  logic.  These  kinds  of  constraints  have 
the  advantages  of  computational  simplicity,  while  retaining  considerable  power  to 
separate  consistent  from  inconsistent  interpretations,  and  of  applicability  at  virtually 
any  node  in  the  interpretation  tree. 
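
The unary and binary tests described above can be illustrated with a short sketch. The `Edge` record, the function names, and the tolerance parameters `eps` and `tol` are assumptions introduced here for illustration only; they are not taken from the cited implementations, and only the length and angle constraints are shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge feature: a segment length and a direction."""
    length: float
    orientation: float  # radians


def unary_length_ok(data: Edge, model: Edge, eps: float) -> bool:
    """Unary test: the data edge may not be longer than the model edge,
    plus a bounded amount of error `eps`."""
    return data.length <= model.length + eps


def angle_between(a: Edge, b: Edge) -> float:
    """Angle between two undirected segments, folded into [0, pi/2]."""
    d = abs(a.orientation - b.orientation) % math.pi
    return min(d, math.pi - d)


def binary_angle_ok(d1: Edge, m1: Edge, d2: Edge, m2: Edge, tol: float) -> bool:
    """Binary test: the angle between the two data edges must roughly
    equal the angle between the corresponding model edges."""
    return abs(angle_between(d1, d2) - angle_between(m1, m2)) <= tol
```

A `False` result rules the pairing (or pair of pairings) out; a `True` result only means that it may still be part of a consistent interpretation.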

Formulated in this way, our approach to recognition can be considered as a
problem of constraint satisfaction, or consistent labelling, a problem that has
received considerable attention in the Artificial Intelligence literature [e.g. Freuder
1978, 1982, Gaschnig 1979, Haralick & Elliot 1980, Haralick & Shapiro 1979,
Mackworth 1977, Mackworth & Freuder 1985, Montanari 1974, Nudel 1983, Waltz 1975].
When we analyze the performance of our system, we will use results from this
literature to guide our development.

To use these constraints, we must now specify a means of exploring the
interpretation tree. We do this using back-tracking depth-first search. (See Figure 3.) That
is,  we  begin  at  the  root  of  the  tree,  and  explore  downwards  along  the  first  branch.  At 
each  node,  we  check  the  unary  constraints  applicable  to  the  new  data-model  pairing, 
and we check the $n - 1$ sets of binary constraints obtained by considering the new
data-model  pairing  relative  to  each  data-model  pairing  defined  by  an  ancestor  node. 
If  all  these  constraints  are  consistent,  then  we  continue  downwards  in  the  search.  If 
one  of  them  is  inconsistent,  we  backtrack  to  the  previous  node.  We  then  explore 
the  next  branch  of  that  node.  If  there  are  no  more  branches,  we  backtrack  another 
level,  and  so  on.  Note  that  the  number  of  constraints  increases  as  we  go  lower  in  the 
tree,  and  hence  the  likelihood  that  the  interpretation  is  in  fact  globally  consistent 
increases. 


Figure 3. The tree is searched in a depth-first, backtracking manner, starting at the root.
If a node is found to be inconsistent, the downward search is terminated, and we backtrack.
Any leaf of the tree that is reached by the search constitutes a hypothesized interpretation.
The  darker  edges  in  the  diagram  indicate  one  example  of  a  backtracking  search. 

If  we  reach  a  leaf  of  the  tree,  we  have  a  possible  interpretation  of  the  data  relative 
to  the  model,  which  we  can  verify  by  solving  for  a  rigid  transformation  and  testing 
that  it  does  take  all  of  the  model  features  into  rough  agreement  with  their  associated 
data  features.  Even  if  we  do  reach  a  leaf  of  the  tree,  we  do  not  abandon  the  search. 
Rather,  we  accumulate  that  possible  interpretation,  back-track  and  continue,  until 
the  entire  tree  has  been  explored,  and  all  possible  interpretations  have  been  found. 
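
The backtracking search just described can be sketched as follows. This is a minimal illustration rather than the implementation used in the cited systems: the function name, the generic feature types, and the calling convention of the `unary_ok` and `binary_ok` predicates are assumptions, and the final verification by solving for a rigid transformation is omitted.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

D = TypeVar("D")  # data feature type (assumed)
M = TypeVar("M")  # model feature type (assumed)


def interpretation_tree_search(
    data: Sequence[D],
    model: Sequence[M],
    unary_ok: Callable[[D, M], bool],
    binary_ok: Callable[[D, M, D, M], bool],
) -> List[Tuple[M, ...]]:
    """Return every leaf interpretation that survives the constraints."""
    found: List[Tuple[M, ...]] = []
    partial: List[M] = []  # partial[i] is the model feature paired with data[i]

    def descend() -> None:
        level = len(partial)
        if level == len(data):
            # Reached a leaf: record it, then keep searching.
            found.append(tuple(partial))
            return
        d_new = data[level]
        for F in model:  # one branch per model feature
            if not unary_ok(d_new, F):
                continue  # inconsistent node: prune the subtree below it
            # Check the new pairing against each ancestor pairing.
            if all(binary_ok(data[i], partial[i], d_new, F) for i in range(level)):
                partial.append(F)
                descend()
                partial.pop()  # backtrack and try the next branch

    descend()
    return found


# Toy one-dimensional example: features are positions on a line, and the
# binary constraint requires that offsets between features be preserved.
same_offset = lambda di, Fi, dj, Fj: (dj - di) == (Fj - Fi)
print(interpretation_tree_search([1, 3], [0, 1, 3], lambda d, F: True, same_offset))
# prints [(1, 3)]
```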

As described, our search method will succeed only when all of the data features
come  from  the  object  of  interest.  In  general,  object  recognition  must  also  work  in 
the  presence  of  clutter  in  the  scene,  in  which  much  of  the  object  may  be  hidden  from 
view,  and  in  which  much  of  the  data  is  spurious,  coming  from  other  objects.  The 
tree  search  method  can  be  straightforwardly  extended  to  handle  this  by  introducing 
into  our  matching  vocabulary  a  new  model  feature,  called  a  null  character  feature. 
At  each  node  of  the  interpretation  tree,  we  add  as  a  last  resort  an  extra  branch 
corresponding to this feature (see Figure 4). This feature (denoted by a * to
distinguish it from actual model features $F_j$) indicates that the data point to which it
is  matched  is  to  be  excluded  from  the  interpretation,  and  treated  as  spurious  data. 
To  complete  this  addition  to  our  matching  scheme,  we  must  define  the  consistency 
relationships  between  data-model  pairings  involving  a  null  character  match.  Since 
the  data  point  is  to  be  excluded,  it  cannot  affect  the  current  interpretation,  and 
hence  any  constraint  involving  a  data  point  matched  to  the  null  character  is  deemed 
to  be  consistent. 
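
One way to express the null character in code is to wrap existing constraint predicates so that any test involving it succeeds. The sketch below is illustrative only: the sentinel `NULL`, the helper names, and the predicate signatures are assumptions, not part of the original method description.

```python
NULL = None  # hypothetical sentinel standing for the null character *


def branches(model):
    """Branches below a node: the real model features first,
    then the null character as a last resort."""
    return list(model) + [NULL]


def lift_unary(unary_ok):
    """Extend a unary predicate so that a null match is always consistent."""
    def wrapped(d, F):
        return F is NULL or unary_ok(d, F)
    return wrapped


def lift_binary(binary_ok):
    """Extend a binary predicate so that any pair involving the null
    character is deemed consistent."""
    def wrapped(d_i, F_i, d_j, F_j):
        if F_i is NULL or F_j is NULL:
            return True
        return binary_ok(d_i, F_i, d_j, F_j)
    return wrapped
```

Passing `branches(model)` and the lifted predicates to a tree search over data-model pairings yields the extended tree of Figure 4, in which each node has m + 1 children.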


Figure  4.  The  interpretation  tree  can  be  extended  by  adding  the  null  character  *  as  a  final 
branch  for  each  node  of  the  tree.  A  match  of  a  data  feature  and  this  character  indicates 
that  the  data  feature  is  not  part  of  the  current  interpretation.  In  the  example  shown,  the 
simple  tree  of  Figure  2  has  been  extended  to  include  the  null  character. 


2.  Previous  results 

This  method  has  been  used  for  recognition  in  a  variety  of  domains  [Grimson  & 
Lozano-Perez 1984, 1987, Gaston & Lozano-Perez 1984, Murray 1987a, 1987b,
Murray & Cook 1988, Drumheller 1987]. Our empirical experience was that the method
was  very  efficient  when  all  of  the  data  features  are  known  to  have  come  from  a  single 
object.  When  spurious  data  is  included,  however,  the  method  slows  down  by  several 
orders of magnitude. If methods for preselecting subspaces of the search space, such
as  the  generalized  Hough  transform  [Ballard  1981],  are  added,  the  method  improves 
in efficiency. By preselection, we mean that only some subset of the possible
data-model pairings are used in the search process, and typically such subsets are chosen
based  on  an  expectation  that  they  give  rise  to  similar  transformations  of  the  model. 
If  premature  termination  is  added  (i.e.  halting  the  search  process  as  soon  as  an 
interpretation  that  is  “good  enough”  is  found),  the  method  improves  even  further. 

In an earlier combinatorial analysis [Grimson 1989], we showed that some of
these empirical observations were supported by formal analysis. The main points of
this analysis are summarized below; formal statements of the main propositions are
included in the appendix for completeness.

•  When  all  of  the  data  features  are  known  to  have  come  from  a  single  object,  the 
number  of  interpretations  is  asymptotic  to  1. 

• When only c of the s data features come from an object with m model features,
the number of interpretations $n^*$ is bounded above by an expression of order

$$O(n^*) = 2^c + [1 + \alpha]^s + 2ms[1 + p_2]^c$$

where $p_2$ is the probability of a pair of random data-model pairings satisfying
binary consistency, and $\alpha$ is a small (< 1) constant that depends on the object
characteristics and the amount of noise in the measurements. The number of
interpretations is bounded below by an expression of order

$$o(n^*) = 2^c + [1 + \beta]^s + 2ms[1 + p_2]^c.$$

• The expected probability of two random data-model pairings being consistent,
$p_2$, is given by


where  k  is  a  constant  (usually  less  than  1)  that  can  be  derived  from  properties 
of  the  object  and  noise  characteristics.  The  appendix  provides  details. 

• If all s sensory measurements are known to lie on a single object with m equal
sized features, the sensory data is distributed uniformly, and if the noise is small
enough, then the expected amount of search needed to find the interpretation
is bounded by

$$m^2 \le N_s \le m^2 + \alpha m s$$

where $\alpha$ is a constant that depends on the object characteristics and the amount
of noise in the sensory measurements.

• If $c_0$ of the s sensory measurements lie on an object with m equal sized features,
the sensory data is distributed uniformly, and if the noise is small enough, then
the expected amount of search needed to find the interpretations is bounded
above by an expression of order

$$O(N_s^*) = m[1 + \gamma]^s + m s 2^{c_0} + \delta m^6 + m^2 s^2 [1 + p_2]^{c_0}$$

and is bounded below by an expression of order

$$o(N_s^*) = m 2^{c_0} + m s$$

where $\gamma, \delta$ are constants that depend on the object characteristics and the
amount of sensor noise, $\gamma < 1$.

As  we  suggested  in  the  introduction,  these  results  show  that  constrained  search  is 
polynomial,  in  fact  quadratic,  when  all  of  the  data  is  known  to  come  from  a  single 
object,  but  is  exponential  when  spurious  data  is  included.  One  way  of  reducing  this 
exponential  cost  is  to  terminate  the  search  as  soon  as  an  interpretation  is  found 
that  is  “good  enough”.  In  this  paper,  we  consider  the  effects  of  this  heuristic  on  the 
search  process. 

3.  Setting  up  the  termination  model. 

We  define  premature  termination  to  be  the  process  of  stopping  the  search  when  an 
interpretation  is  found  that  is  “good  enough”.  We  define  our  measure  of  goodness 
to  be  the  number  of  data  features  included  in  an  interpretation  that  are  matched 
to  a  real  model  feature,  and  not  the  null  character.  Other  definitions  are  possible, 
such  as  the  fraction  of  an  object’s  perimeter  that  is  accounted  for  by  the  data,  but 
for  our  purposes  the  simple  counting  of  features  suffices.  Thus,  we  set  a  threshold 
on  the  size  of  an  interpretation,  and  we  will  terminate  the  search  as  soon  as  we 
find  a  valid  interpretation  of  that  size.  In  [Grimson  and  Huttenlocher  1989],  we 
consider  the  problem  of  how  to  properly  select  such  a  threshold  so  that  there  are  no 
expected  false  positives.  Here,  we  simply  assume  that  any  interpretation  exceeding 
the  threshold  is  a  correct  one. 
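
As a small illustration of this termination rule, the sketch below counts the non-null matches in an interpretation and stops at the first interpretation that reaches the threshold. The sentinel `NULL` and the function names are hypothetical, and the interpretations are assumed to arrive lazily, in the order the tree search produces them.

```python
NULL = None  # hypothetical sentinel for the null character


def goodness(interpretation):
    """Number of data features matched to a real model feature."""
    return sum(1 for F in interpretation if F is not NULL)


def good_enough(interpretation, t):
    """True once the interpretation reaches the size threshold t."""
    return goodness(interpretation) >= t


def first_good_enough(interpretations, t):
    """Consume interpretations one at a time and stop at the first one
    that meets the threshold; return None if none does."""
    for interp in interpretations:
        if good_enough(interp, t):
            return interp  # premature termination: later ones are never generated
    return None
```

Because the loop returns as soon as the threshold is met, a lazily evaluated search behind `interpretations` does no further work after that point.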

To see how termination can reduce the search process, consider a simple
example. Suppose we have a scene with s = 6 features, a model with m = 2 features, and
a threshold of t = 3. In principle, the constrained search method would examine a
tree of depth 6, the kth level of which would have $(m + 1)^k$ nodes to be examined,
for a total of

$$\frac{(m + 1)^{s+1} - 1}{m}$$

different  nodes.  Of  course,  many  of  these  nodes  would  not  be  examined  because 
ancestor  nodes  in  the  tree  would  be  inconsistent  with  the  constraints,  and  the 
subsequent  subtree  could  be  pruned.  Nonetheless,  consider  what  happens  when 
a  threshold  on  search  is  included. 

For  simplicity,  we  consider  the  subtree  below  a  node  at  the  first  level  of  the 
tree.  In  this  case,  there  are  in  principle 

$$\frac{(m + 1)^{s} - 1}{m} = \frac{3^6 - 1}{2} = 364$$

nodes  to  be  explored  in  this  subtree.  In  Figure  5,  we  show  the  subtree  under  a 
node  on  the  first  level  of  the  tree  that  would  be  searched  when  a  threshold  on 
interpretation  length  is  used  to  prune  the  tree.  Notice  that  once  we  reach  a  node 
with  t  assignments  of  data  features  to  actual  model  features  (i.e.  not  to  the  null 
character),  we  can  terminate  further  downward  search.  Similarly,  once  we  reach  a 
node  for  which  it  is  not  possible  to  obtain  a  t  interpretation,  no  matter  what  happens 
to the remaining data features, we can again terminate further downward search.
In  the  case  shown  in  the  figure,  only  64  nodes  need  to  be  explored,  almost  an  order 
of  magnitude  decrease  in  effort.  As  in  the  normal  case,  some  of  the  nodes  will  not 
be  reached  due  to  inconsistencies  in  the  constraints,  but  we  can  clearly  see  that  in 
principle  the  number  of  nodes  to  be  explored  is  reduced  from  the  straightforward 
case. 
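
The two node counts in this example can be reproduced with a short sketch. It assumes that every constraint test succeeds (so only the threshold prunes the tree), that a child which can no longer reach t real matches is never generated, and that the root of the subtree is itself counted. The function names are hypothetical.

```python
from functools import lru_cache


def full_subtree_size(m, s, level=1):
    """Nodes in the unpruned subtree rooted at a node on `level`,
    root included, when every node has m + 1 children."""
    depth = s - level
    return ((m + 1) ** (depth + 1) - 1) // m


def thresholded_subtree_size(m, s, t, level=1, real=1):
    """Nodes explored below (and including) a node on `level` that already
    has `real` non-null matches, when search stops at t real matches."""

    @lru_cache(maxsize=None)
    def count(level, real):
        if real >= t:
            return 1  # good enough: no further downward search
        remaining = s - level  # data features still unassigned
        # m children assign the next data feature to a real model feature.
        total = 1 + m * count(level + 1, real + 1)
        # The null child is explored only if t real matches remain reachable.
        if real + (remaining - 1) >= t:
            total += count(level + 1, real)
        return total

    if real + (s - level) < t:
        return 0  # the starting node itself could never reach t
    return count(level, real)


print(full_subtree_size(2, 6))            # prints 364
print(thresholded_subtree_size(2, 6, 3))  # prints 64
```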


Figure 5. The portions of the interpretation tree under the first node that need to be
explored using search termination. In this case, we have set m = 2, s = 6 and t = 3. The
circles indicate nodes actually explored.


4.  The  formal  model 

We  will  derive  results  on  the  effects  of  prematurely  terminating  the  search  process  in 
several  steps.  We  begin  by  defining  a  formal  model  for  the  probability  of  consistency 
of  a  node  in  the  tree.  Given  that  model,  we  derive  an  explicit  expression  for  the 
expected  number  of  nodes  searched  in  a  tree.  We  then  bound  this  expression,  and 
use  these  bounds  to  derive  simpler  order  of  growth  bounds  on  the  expected  search. 
These  are  summarized  in  the  corollaries  to  Proposition  3,  in  which  we  show  that  the 
expected  search  is  cubic  in  the  parameters  of  the  problem. 

We  begin  with  the  formal  model  for  consistency.  Since  our  method  uses  both 
unary  and  binary  constraints,  we  need  to  model  the  probability  that  a  data-model 
assignment  is  consistent  and  the  probability  that  a  pair  of  data-model  assignments 
are  consistent. 

Similar to our earlier analysis [Grimson 1989], we let $q_{iI}$ denote the probability
that assigning the ith data element to the Ith model element is consistent, and
we let $q_{ij;IJ}$ denote the probability that the pair of assignments $i \mapsto I$, $j \mapsto J$ is
consistent. Our model of the recognition problem is defined as follows.

For a single data-model pairing, if the pairing is part of the correct
interpretation, the probability of consistency is simply 1. Similarly, any pairing involving the
null character is consistent with probability 1. If the pairing is not correct, we let
the probability of consistency be $p_1$. Thus, we have

$$q_{iI} = \begin{cases} 1 & \text{if } i \mapsto I \text{ is correct,} \\ 1 & \text{if } I \text{ is the null character,} \\ p_1 & \text{otherwise.} \end{cases}$$

For a pair of assignments, suppose we are considering a match in which data
fragments i, j are paired with model fragments I, J respectively. We will model the
situation by saying that the consistency of this pair of pairs has probability 1 if
these pairings are part of the correct interpretation, or if either of them is assigned
to the null character. Otherwise we will assume that the probability of consistency
is $p_2$. Note that this is essentially assuming a random distribution of edges. It is
also assuming that pairs of model edges are distinctive, so that objects with partial
symmetries are excluded. Thus, we have

$$q_{ij;IJ} = \begin{cases} 1 & \text{if } i \mapsto I,\ j \mapsto J \text{ is correct,} \\ 1 & \text{if either } I \text{ or } J \text{ is the null character,} \\ p_2 & \text{otherwise.} \end{cases}$$

Given a partial interpretation at a node, the probability of consistency is given by

» 

We can use the above definitions for q to derive an explicit expression for the
expected number of nodes in the tree.
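
To make the consistency model concrete, the sketch below evaluates the probability that a node is consistent, under the added assumption that the individual unary and pairwise tests are independent, so that their probabilities multiply. The function name and its argument names are hypothetical: `u` counts correct pairings at the node and `w` counts pairings to a real but incorrect model feature. Null matches contribute a factor of 1 and are therefore ignored.

```python
from math import comb


def node_consistency_probability(u, w, p1, p2):
    """Probability that a partial interpretation is consistent.

    Each of the w incorrect pairings passes its unary test with
    probability p1. Every pair of real pairings that is not a pair of
    two correct pairings passes its binary test with probability p2.
    Independence of the tests is an assumption of this sketch.
    """
    # Pairs among all real pairings, minus the pairs that are both correct.
    uncertain_pairs = comb(u + w, 2) - comb(u, 2)
    return (p1 ** w) * (p2 ** uncertain_pairs)
```

For example, a node with one correct and one incorrect pairing has probability `p1 * p2`, while a node containing only correct and null pairings has probability 1.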


Proposition  1:  Assume  that  the  data  features  that  actually  arise  from  the 
object of interest are uniformly interspersed among the spurious features, occurring
with  frequency 

Assume we are given a partial interpretation based on $\ell - 1$ data features, of which u
are correctly assigned, the remaining $\ell - u - 1$ being matched to the null character.
If we assign the next data feature to a real, but incorrect, model feature, then
the number of nodes below this point in the tree that will on average be explored,
denoted by $W(s, u, \ell)$, is given by

.•  i-i«i  c+“+,)-(*+j*.j) 

Pl  2_,  P2 

k=0  1=0  '  ' 

k  =  t  —  u  Yt=max{0, t—  a+t+k— u  — 1}  ' 

+  ‘"if  . 

- - /ft*  N  '  I 


+i)-^.+i)in('+r2)-(“+l^+,,J) 
P  2 


+  E 

-  niM  {o  ,t  -  *+ r+  k  -  U  } 
A proof of this proposition is deferred to the Appendix.

We can check the correctness of this expression by setting $p_1$ and $p_2$ to 1.
Applying the resulting expression to the case of $u = 0$, $\ell = 1$, $m = 2$, $s = 6$, $t = 3$, as
shown in Figure 5, yields the correct result of 64 nodes.

We  can  use  Proposition  1  to  establish  bounds  on  the  search  of  such  a  subtree. 
This  is  done  in  the  following  proposition: 


Proposition  2:  The  expression  of  Proposition  1  can  be  bounded  by: 
W(s,u,i)  <  PlP%  [(t  -  u)j°(u)+Vo(u)~i 

+  (s  —  l  —  t  +  u)(t  -  u  —  1)  [(s  -  £  -  2)p],0*u's_*_1)  x 


( 


x  1  4-  mp\  °p2 


1-6  2“+! - T - 


and  by 


W(s 


r  a 

,  u,t)  >  P\p\u  [3  —  t  +  u  —  l  +  2]  1  -  (Z  —  u)mp\~  p* 


—  3)  —  ^*-f2 


where 


p  =  mp\  sp[{u) 
/(«)  = 

,  _  i  (<-«)/*-  1  i 

L  1+7*"  J 

,  (k  -  l)v  -  1 , 

10  =  i“i+r" J 


i~«, 


v  =  mp\  “p2 


The  previous  claim  gives  us  a  lower  bound  on  the  expected  search  of  a  particular 
subtree.  How  do  we  use  it  to  bound  the  search  of  the  whole  tree?  Under  the 
assumption  that  the  c  correct  data  features  are  uniformly  interspersed  throughout 
the  full  set  of  s  data  features,  we  can  see  that  at  the  top  level  of  the  tree,  we  must 
search m subtrees, with $u = 0$, $\ell = 1$, that is, for each possible assignment of the first
data  feature  to  a  real  model  feature,  we  must  explore  the  appropriate  subtree.  Since 
the  first  data  feature  is  not  part  of  the  true  object,  once  we  have  exhausted  these 
subtrees,  we  must  move  on  to  interpretations  that  exclude  the  first  data  point,  by 
considering  the  portion  of  the  tree  below  the  node  that  pairs  the  first  data  point  to 
the  null  character.  Under  this  node,  we  consider  pairings  of  the  second  data  feature. 
Again, we must consider m subtrees, with $u = 0$, $\ell = 2$. We continue this process
until we reach level $\ell = s/c$. In this case, we have a data feature that does have a
correct match, and on average this will be found after we have searched $m/2$ subtrees
at this level. We then repeat this process below this node in the tree, with $u = 1$,
and so on. Hence, the expected total amount of search is given by:


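    W(s) = \sum_{j=0}^{t-1} \left[ \sum_{i=1}^{s/c - 1} \left( m W(s, j, \tfrac{s}{c} j + i) + 1 \right)
           + \tfrac{m}{2} W(s, j, \tfrac{s}{c}(j + 1)) + 1 \right]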


To  obtain  bounds  on  this  expression,  we  simply  need  to  substitute  from  Proposition 
2,  and  simplify. 
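
The level-by-level accounting described above can be sketched in code. This is only an illustrative sketch under stated assumptions: `W` is a hypothetical callable standing in for the subtree cost W(s, u, ℓ) of Proposition 1, s is assumed to be a multiple of c so that one correct feature appears every s/c levels, and the correct match is assumed to be found after m/2 subtrees on average.

```python
from typing import Callable


def expected_total_search(W: Callable[[int, int, int], float],
                          m: int, s: int, c: int, t: int) -> float:
    """Add up subtree costs level by level until t correct features are matched.

    W(s, u, l) is a stand-in for the expected size of one subtree rooted at
    level l when u features are already correctly matched (an assumption of
    this sketch, not a formula taken from the paper).
    """
    if s % c != 0:
        raise ValueError("this sketch assumes s is a multiple of c")
    period = s // c  # one correct data feature every `period` levels
    total = 0.0
    for j in range(t):  # j correct features matched so far (u = j)
        # Spurious data features: all m subtrees fail, then the null node.
        for i in range(1, period):
            total += m * W(s, j, period * j + i) + 1
        # Correct data feature: about half of the m subtrees are tried first.
        total += (m / 2) * W(s, j, period * (j + 1)) + 1
    return total
```

With a subtree cost of zero the function returns t·(s/c), which is just the count of nodes visited along the correct path; any bound on W can be passed in to obtain the corresponding bound on the total.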


Proposition 3: Given a uniform distribution of correct data features among
the spurious, and given the previously derived expression for the binary probability
of consistency:

    p_2 = (\kappa / m)^2

the total amount of search expected is bounded by


t-'o+V0-1  ^i  +  (f- 

1  -P2 


+  l'°c(t  -  1) 


+  P 


l-p*3-^ 


.1-1*  '  "1  -P?-* 


+/? 


[Pr^l-P^3-^) 

tP?-S)t  ] 

'll 

(1-p3-^ 

1-P3-4. 

JJ 

where 


    \gamma = (s - 3)\mu
    i_0 = \lfloor (s - 3)\mu - 1 \rfloor
    j_0 = \lfloor \kappa^2 - 1 \rfloor

,  ,  3-(3- 

0  =  mp\~6p2  5 

‘-‘-‘"KH 


and  by 


W(s)  >  7  +  mp! 

C> 


'-«-«• 


,  1  -  Pi*  .pid-pj1)  .  Kpj' 
1-P2  ( 1  —  p>  )2  1-P2 


/l-P(23_4)f  ,  P2 

+<*  [  a - — — — - b 


l-p(23-4) 


( 1  ~  Pi 


,(3-«))2 


)+  6tp^ 


i-Prr 
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where 


6=(H(H) 


3£~£r4-2 


a  =  mp\  sp'2  7 


The  key  results  follow  from  this  proposition. 


Corollary  3.1:  The  order  of  magnitude  of  the  expected  search  is  given  by: 


and  by 


o(W(s))  =  ms- 


_  .....  . .  s  f  k2  \ 2  /  2  s  \  lit*2-1! 

0(W^(s))  =  amts-(l  +  — )  (*2  — ) 

c  \  m  J  \  m  / 


where α is a small constant.

Proof:  For  both  bounds,  we  can  simply  identify  the  dominant  term,  substitute 
in  the  bounds  on  the  variables,  yielding 


and 


°W5))  =  + 

2 ^  2  '  *  '  L^*2-1! 


(19) 


0(W'M)  =  m((»-1-i-£)i(l  +  £) 

The  simpler  expressions  follow.l 


Corollary 3.2: If the scene clutter and the noise in the data are such that

    \frac{s}{m} < \frac{2}{\kappa^2}

then premature termination has an expected search that is of order

    O(W(s)) = \alpha m t s \frac{s}{c}

and

    o(W(s)) = m s \frac{s}{c}.

Proof:  If  the  conditions  hold,  then  the  exponent  in  the  previous  corollary  becomes 
0  and  the  upper  bound  on  the  search  reduces  to 


The simplification follows.

5. Implications of the results

Several interesting conclusions may be drawn from the above analysis. First, we note
from Corollary 3.1 that we cannot guarantee that terminated search will result in
a polynomial time algorithm, as the upper bound is still exponential. At the same
time, however, the search is clearly reduced, as both the base and the exponent are
much reduced.

Corollary 3.2 provides a fascinating result, however, since it indicates conditions
on the problem under which the expected search does become polynomial, basically
implying that if the scene clutter is small enough, a polynomial algorithm results.
This has two interesting implications. If the number of scene features is small enough
relative to the size of the model, it implies that terminated search will perform well.
When the scene clutter increases, however, we must provide some form of grouping
or selection to reduce the number of scene features actually considered in the search
below

    \frac{2m}{\kappa^2}.

Here, selection means isolating a subset of the data features, most of which are
believed to have come from a single instance of a known object.
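
As a small illustration, the clutter condition can be turned into a check on whether selection is needed. This is a sketch that assumes the condition of Corollary 3.2 reads s/m < 2/κ²; the function names are hypothetical and the value of κ must be supplied by the user.

```python
def max_scene_features(m: int, kappa: float) -> float:
    """Clutter level 2m / kappa**2 below which the search is expected to be
    polynomial, under the reading of Corollary 3.2 assumed in this sketch."""
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    return 2.0 * m / kappa ** 2


def needs_selection(s: int, m: int, kappa: float) -> bool:
    """True when s scene features reach or exceed the assumed bound, so that
    grouping or selection should reduce s before searching."""
    return s >= max_scene_features(m, kappa)
```

For example, `needs_selection(200, 20, 0.886)` asks whether a scene with 200 features and a 20-feature model calls for selection when κ takes the value .886 listed in Figure 6.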

This nicely extends our earlier results on the role of selection in efficient object
recognition. The results of [Grimson 1989] imply that for pure constrained search,
knowing that all of the data features are from a given object will reduce the expected
search to the polynomial domain, but general constrained search remains exponential.
This suggests that our selection mechanism must be very accurate at selecting
out subsets of the data features for consideration, since if even one spurious point
is included we must either use an exponential search method, or tolerate having the
entire subset of data features rejected. When premature search termination is
added, however, Corollary 3.2 implies that we can tolerate considerably more
uncertainty on the part of the selection process and still have an efficient search method.
We simply require that the selection method allows an amount of spurious data that
is bounded by the conditions of Corollary 3.2.

Also note that both Proposition 3 and its Corollaries involve the constant
κ, which is determined by properties of the object model and the sensing system.
In particular, κ increases with increasing noise in the sensory data, and this, as
expected, implies both that the amount of expected search will increase, and that
the amount of spurious data that can be tolerated, while maintaining a polynomial
algorithm, decreases. Standard values for κ are on the order of

    \kappa \approx \frac{P}{D}

where P is the total perimeter of the object (for the case of 2D objects) and D is
the dimension of the image. Given this, we see that our conditions for a polynomial
search are that

    s < \frac{2m}{\kappa^2} \approx 2m \left( \frac{D}{P} \right)^2

so that if the object is of a size on the order of the image (P ≈ D), considerable
amounts of spurious data are still tolerable, while maintaining a polynomial search
algorithm.

5.1  Comparing  search  results 

To more directly compare the results derived here, we can consider some earlier
analysis of constrained search in object recognition. In [Grimson, 1989], we analyzed
the combinatorial behavior of the constrained search approach, and showed two major
results. The first is that if all of the data are known to have come from a single
object, so that we need not use the null character to exclude spurious data, then
the amount of search is bounded by
m2  <  Wno 

—  occ  <  p^m2  +  (1  +  Pik)  2  ms. 

Hence  the  search  process  is  polynomial  in  this  case. 

If,  however,  spurious  data  is  included,  we  showed  that  the  search  is  exponential. 
In  our  earlier  analysis,  we  did  not  use  any  assumptions  on  the  distribution  of  the 
correct  sensory  data  features  in  the  search  process.  To  more  directly  compare  the 
two  methods,  below  we  derive  bounds  on  the  constrained  search  process  under  the 
assumption  of  uniform  distribution  of  the  correct  data  features. 


Proposition  4:  If  the  sensory  data  arising  from  a  correct  interpretation  are 
uniformly  distributed  among  the  spurious  data,  then  the  amount  of  search  expended 
by  the  normal  constrained  search  method  is  bounded  by 


m-2c  <  Wocc  <  m-2c  +  -  [1  +  e]a 
c  c  e  1  1 


1  + 


Pi 


-l  c— 1 


1  +  € 


m3s  s 


k2  c 


-Ji  +  PiV. 


With these results, it is clear that premature termination of the search process
can significantly reduce the work involved in locating an object. From Corollary
3.2, we know that if the scene clutter is small enough, the expected search reduces
to order

    m s \frac{s}{c} < W_{term} < m t s \frac{s}{c}.

This is clearly significantly smaller than the expressions in Proposition 4.
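
To get a feel for the gap, the leading terms can be evaluated numerically. The sketch below assumes the terminated-search orders are m·s·(s/c) and m·t·s·(s/c) and that the leading lower-bound term of Proposition 4 is m·(s/c)·2^c; constants such as α are dropped, and the parameter values in the usage note are illustrative, not taken from the paper's experiments.

```python
def terminated_search_order(m: int, s: int, c: int, t: int) -> tuple:
    """Leading-order (lower, upper) estimates for terminated search,
    as assumed in this sketch: m*s*(s/c) and m*t*s*(s/c)."""
    ratio = s / c
    return m * s * ratio, m * t * s * ratio


def exhaustive_lower_order(m: int, s: int, c: int) -> float:
    """Assumed leading lower-bound term for non-terminated constrained
    search with spurious data: m*(s/c)*2**c."""
    return m * (s / c) * 2 ** c
```

For instance, with m = 20, s = 100 and c = t = 20, `terminated_search_order` stays polynomial in all parameters while `exhaustive_lower_order` grows with 2^c, so the two differ by several orders of magnitude.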

As a consequence, the main conclusion we can draw is that premature termination
of a constrained search method can dramatically reduce the expected search
required to recognize and locate objects in cluttered noisy data. To obtain
polynomial time algorithms for recognition, we must keep the ratio of scene clutter
to object size below a well defined bound, and this implies that for significantly
cluttered scenes, some type of grouping or selection mechanism is needed to select
out subsets of the data features that are likely to include a subset arising from an
instance of a known object.

Appendix

In  this  appendix,  we  present  formal  proofs  of  the  propositions  stated  in  the  main 
text. 

We  begin  by  stating  the  main  propositions  from  earlier  analysis  in  Grimson 
[1989].  (Note  that  the  numbers  of  the  propositions  refer  to  the  numbers  used  in 
that  article.)  In  that  analysis,  we  first  derived  bounds  on  the  number  of  consistent 
interpretations,  both  in  the  case  of  data  known  to  have  come  from  a  single  object, 
and  in  the  case  of  spurious  data. 


Proposition 1 [Grimson, 1989]: If all of the k sensory measurements are
known to lie on a single object with m features, then the number of interpretations
n_k is bounded by

nk<  [l  +  (m  -  ^PipT5-]*- 

and  by 

"fc  >  1  +  [p|  +  pi(m  -  l)j  p2'  J~  }  -  p £■ 

where p_1 is the probability of a random data-model assignment satisfying unary
consistency, and p_2 is the probability of a pair of random data-model assignments
satisfying binary consistency.


Proposition 2 [Grimson, 1989]: Given an object with m faces and given
k sensory data points, of which c actually lie on the object, the number of
interpretations n*_k is bounded by

<  2C  -  [1  +p2]C  +  [l  +  mpip^]*“c[p2  +  1  +  mpip|]c 
+  mpi  [l  -  p2^~]  [l  +  pj] C_1  [k  +  p2(k  -  c)] 

and  by 

nt>  2C-  [l  +  p2‘ipS]C+  [l  +  (m-l)p1p2"fi]fc-c[l-f-(m-l)p1pjfl  +  p^]c 

+  Pi(m  -  l)[l  +P2]C-1[*  +  Mk~  c)] 

-  Pi(m  -  ljp^5-  [l  +  Pr3”]01  +PiT' (k  ~  c)] 
where p_1 is the probability of a random data-model assignment satisfying unary
consistency, and p_2 is the probability of a pair of random data-model assignments
satisfying binary consistency.

To obtain order of magnitude expressions on the amount of search required to
find these interpretations, we need to relate the probability of consistency to aspects
of the problem. We established that the probability of consistency is inversely
proportional to the number of model features, for a fixed amount of sensor noise
and a fixed size object model:


Proposition 3 [Grimson, 1989]: Given a two dimensional object with m
equal sized edges of length L, and given sensory data that is distributed uniformly
in transform space with a uniform distribution of lengths, the expected probability
of two random data-model pairings being consistent, p_2, is given by


where 


K  —  K% u  —  ^ 

1 4ca 

TV 

^(Cp)2  +2c;(l  -  h*) 

+  sinfa(i  h*)2 

7 r 

p 

15 

in  the  worst  case,  and 

II 

a 

sc 

II 

SC 

/llfL 

1  TV 

*(e;)2  +  <;(i  -  *•) 

p' 

D 

in  the  uniform  distribution  case,  and  where  ca  is  a  bound  on  the  error  in  measuring 
orientation,  ep  is  a  bound  on  the  error  in  measuring  position,  h  is  the  minimum 
length  data  edge,  cp  =  ^ ,  ft*  =  j,  P  is  the  perimeter  of  the  object,  and  D  is  the 
dimension  (width)  of  the  image.| 


To illustrate the range of values for this constant, in Figure 6, we list the values
for κ_u for a range of values of ε*_p and a range of values of P/D. We fix h* = 2ε*_p
and ε_a = tan^{-1} 2ε*_p. As expected, the constant increases with increasing noise, and
as the size of the object increases.


    P/D =    .125     .25      .5       1        2        4        8

             .002     .004     .008     .016     .033     .065     .131
             .021     .042     .085     .169     .338     .677     1.354
             .111     .222     .443     .886     1.772    3.545    7.090

Figure 6. Values for the constant κ_u for a range of values of ε*_p (rows) and a range of
values of P/D (columns). We fix h* = 2ε*_p and ε_a = tan^{-1} 2ε*_p.

A  similar  result  holds  for  three  dimensional  objects.  This  result  can  be  used  to 
establish  the  following  two  sets  of  bounds  on  the  amount  of  search  involved. 


Proposition 8 [Grimson, 1989]: If all of the s sensory measurements are
known to lie on a single two-dimensional object with m equal sized edges of length L,
m > 3, the sensory data is distributed uniformly in transform space, with a uniform
length distribution, and if the noise is small enough, then the expected amount of
search needed to find the interpretation is bounded by

    m^2 < N_s < m^2 + \alpha m s

where α is a constant that depends on the object characteristics and the amount of
noise in the sensory measurements.

Proposition 9 [Grimson, 1989]: If c_0 of the k sensory measurements lie on
a two-dimensional object with m equal sized edges of length L, the sensory data is
distributed uniformly in transform space, with a uniform length distribution, and if
the noise is small enough, then the expected amount of search needed to find the
interpretations, for m large, is bounded by

N‘  -  m  ~ +  2C°(S  ~  co  +  1] 

+*».[£ +!!+«]“[(')  -  (j)  + 

N;  >  m  2Co+1  +  s  -  c0  -  3] 

where 


and where κ is a constant that depends on the object characteristics and the amount
of sensor noise, and p_1 is the probability of a random data-model assignment
satisfying unary consistency.

Given  these  results  as  a  basis,  the  text  of  the  paper  presents  a  similar  analysis  for 
the  case  of  premature  termination.  The  main  results,  with  proofs,  are  summarized 
below. 


Proposition 1: Assume that the data features that actually arise from the
object of interest are uniformly interspersed among the spurious features, occurring
with frequency

    \delta = \frac{c}{s}.

Assume we are given a partial interpretation based on ℓ - 1 data features, of which u
are correctly assigned, the remaining ℓ - u - 1 being matched to the null character.
If we assign the next data feature to a real, but incorrect, model feature, then
the number of nodes below this point in the tree that will on average be explored,
denoted by W(s,u,ℓ), is given by

f  ^  11**  1  k  /f\  / :  1  ..  a_  1  y  /  »  II.  I  \ 


v'  fk\  «  <-i«j  c +r,)-(*+r,J: 

.  Ar= 0  1=0  '  ' 
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a-t-l  / 
+  E 

t-u- 2 

E 

(^7  1)m<+1Pi<+1)~Ltf(,+1)J 

rr2H 

P2 

k=t—u  \ 

i=max{0,t—  a+f+k— u  —  1} 

+ 

i  .  i-i*j  r“+‘M“+rj0 

|m'pi  Jp^  v  j 

(i) 

t=max{0,*  —  u}  v  / 

) 

I 


Proof: We can see how this sum arises by the following argument. First, the
probability that this assignment satisfies the unary constraints is given by p_1, which
multiplies the remaining summations. Since we already have a u-interpretation in
hand, and since we are assigning the next data feature a non-null character, we must
explore the next t - u - 1 levels in detail. Hence the first sum in the expression
counts the number of nodes in this case. The summation over k counts the number
of nodes at each succeeding level, and the summation over i counts the nodes at a
particular level, by considering the number of features assigned a non-null character.
For i such features, there are \binom{k}{i} different ways of selecting them, and for each one,
there are m possible assignments. To determine the consistency, we multiply by the
probability of applicable unary and binary constraints holding true. Note that the
exponent for the unary constraint probability counts those data-model assignments
that are not correct. The exponent for the binary constraint probability counts the
total number of possible pairs of data-model assignments, minus those involving
only correct assignments.

Once we have reached the level in the tree at which the first possible
t-interpretations may occur, our search narrows. In particular, we need not consider exploring
portions of the tree for which interpretations of size larger than t are involved, and
we need not consider exploring portions of the tree for which interpretations of size
t are impossible. The remaining two summations count these cases, the first one
counting those cases in which the most recently assigned data feature has been given
a non-null character, and the second one counting those cases for which the most
recently assigned data feature has been matched to the null character.


Proposition  2:  The  expression  of  Proposition  1  can  be  bounded  by: 
W(a,u,f)  <  plP£  [(t  -  „)*<>(«)+ Vo(»)-i 

+  (s  -  l  -  t  +  u)(t  -  u  -  1)  [(s  -  i  -  2)/z]’°*u’,~^~1*  x 


(l  +  mPi 


Vi 


“)] 


and  by 

W(s,U,l)  >  PlP2U  [s  —  t  +  u  —  f+2] 


1  -  (t  -  u)mp\-sp'2  4 


(10) 


(16) 
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where 


M  =  mp\-sp{^ 


/(«)  = 


2u(l-6)  +  2  +  6-S2 


    j_0 = \lfloor ((t - u)\mu - 1) / (1 + \mu) \rfloor

    i_0 = \lfloor ((k - 1)\nu - 1) / (1 + \nu) \rfloor

a»(i-Q-M+«-Ma 

i/  =  mp{  6p2 ^ 


Proof: To use equation (1), we want to obtain closed form bounds on the sums.
We begin with an upper bound.

Consider the first sum in equation (1). First, we can use

    \delta i - 1 < \lfloor \delta i \rfloor \le \delta i

to remove the dependence on ⌊·⌋. Second, since p_2 < 1, we can get an upper bound
on the expression by replacing the resulting exponent for p_2 with a linear expression
in i that is smaller than the current exponent, in particular by replacing terms in
i^2 by similar terms in i. This leads to the upper bound for the first summation of

    p_1 p_2^u \sum_{k=0}^{t-u-1} [1 + \mu]^k

where

    \mu = m p_1^{1-\delta} p_2^{f(u)}

    f(u) = \frac{2u(1 - \delta) + 2 + \delta - \delta^2}{2}

We can simplify this by using the geometric progression, to yield:

    p_1 p_2^u \frac{[1 + \mu]^{t-u} - 1}{\mu}        (2)



This is still an exponential, albeit a small one. We can reduce this further, by
observing that

    [1 + \mu]^{t-u} = \sum_{j=0}^{t-u} \binom{t-u}{j} \mu^j        (3)

and asking when the largest term occurs.

In general

    \binom{n}{j} \epsilon^j

will reach a maximum for the smallest j such that the jth term is larger than the
(j + 1)st term. This implies

    j + 1 > (n - j)\epsilon

or equivalently that the index for the largest term is

    j = \lfloor (n\epsilon - 1) / (1 + \epsilon) \rfloor.        (4)
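
The largest-term argument can be checked numerically. The sketch below finds the index by scanning for the first term that exceeds its successor, rather than by a closed form, and then forms the bound 1 + n·(largest term) on (1 + x)^n that the argument relies on. The names `n` and `x` are placeholders for the number of terms and the small quantity being raised to powers; in the cases tried, the scanned index differs from the closed form (4) by at most one.

```python
from math import comb


def largest_term_index(n: int, x: float) -> int:
    """Smallest j such that C(n, j) * x**j exceeds the (j+1)st term.

    The terms of the binomial expansion rise and then fall, so this index
    marks a largest term. Returns n if the terms never decrease.
    """
    for j in range(n):
        if comb(n, j) * x ** j > comb(n, j + 1) * x ** (j + 1):
            return j
    return n


def max_term_bound(n: int, x: float) -> float:
    """Upper bound 1 + n * (largest term) on (1 + x)**n, obtained by
    replacing each of the n terms after the first by the largest one."""
    j = largest_term_index(n, x)
    return 1 + n * comb(n, j) * x ** j
```

Comparing `max_term_bound(n, x)` with `(1 + x) ** n` for small x shows how loose the bound is in practice; it is used in the proof because it has a simple closed form, not because it is tight.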

In our particular case, we let

    j_0 = \lfloor ((t - u)\mu - 1) / (1 + \mu) \rfloor

An examination of μ under the limits on δ and u shows that

    m p_1 p_2^t \le \mu \le m p_2.

In [Grimson 1989], we showed that if the data features are randomly distributed,
then the probability of binary consistency is given by

    p_2 = (\kappa / m)^2

where κ is a constant that depends on the characteristics of the object model and
the amount of sensor noise. Substituting, we see that

    0 < \mu < \frac{\kappa^2}{m}.

Hence

    j_0 \le \lfloor \kappa^2 - 1 \rfloor.

Using equation (3), we have

    [1 + \mu]^{t-u} \le 1 + (t - u) \binom{t-u}{j_0} \mu^{j_0}
                    \le 1 + (t - u)^{j_0 + 1} \mu^{j_0}

and substitution into equation (2) implies that an upper bound on the first
summation in equation (1) is given by

    p_1 p_2^u (t - u)^{j_0(u) + 1} \mu^{j_0(u) - 1}.        (5)

Now  consider  the  second  summation  in  equation  (1).  This  is  bounded  above  by 

'e  if  f*  T  1  Vvr  (6) 

k=t- u  «=0  '  *  ' 

We  can  use  the  same  method  as  above,  replacing  the  exponent  for  p2  with  a  smaller 
exponent  linear  in  i,  which  yields  an  upper  bound  on  expression  (6)  of 

«-<-! t-u-2 


2s  2s  l  i  jmpl  P2  (mPl  p2  ( ® ) ) 

t=t-u  »=0  '  ' 


where 


<K«)  = 


2tx(l  -  6)  +  4  +  6-36* 


If  we  let 

v  =  mp}~4pf 

then a similar analysis to the first case indicates that the largest term occurs for
index

    i_0(u, k) = \lfloor ((k - 1)\nu - 1) / (1 + \nu) \rfloor.

This allows us to reduce the bound for equation (6) further, to


1_4  2u+l- 


V~*  1-6  2 

2^  mPi  Pi 


(f  —  u  —  1)  [(A?  -  l)i/]<o(u'fc) 


Now the maximum value for ν is the same as the maximum for μ, namely

    \frac{\kappa^2}{m}.

Hence, kν can be on the order of (k/m)κ^2, which in general will be larger than 1. This
implies that the largest term in the summation in equation (7) will occur for i_0 as
large as possible, and this leads to the following bound for the second summation
in equation (1):


mp\  4P2U+1  i(s  _  l  -  t  +  u)(t  -  u  -  1)  [(s  -  t  -  1  (8) 

A  similar  analysis  of  the  third  summation  yields  a  bound  of 

(S  -  l  -  t  +  «)(<  -  u  -  IK  [{s-l-  2)/t]<l(“’*-/-1) ,  (9) 


*±±iL7%- -1) 


where 


By piecing together equations (5), (8) and (9), and by noting that ν < μ, we have
as an upper bound:

W(s,u,f)  <  Plp£  [(*  -  u)-’0<t‘)+Vio(u)'1 

+  (3  -  l  -  t  +  tl)(f  -  tt  -  1)  [(«  -  t  -  2)/u],#(“’*_4-i)  X 

*  (l  +  ^p!_4P2U+1  +  ^  .  (10) 


Now  consider  a  lower  bound  on  equation  (1).  Consider  the  first  sum: 


t-u-l  * 


fc-0  t=0  '  ' 


To reduce this expression, we need to replace the exponent for p_2 with a larger
expression linear in i. Replacing ⌊δi⌋ with δi - 1, and replacing quadratic terms in
i with linear ones, we get a lower bound on equation (11) of


t — ti — l  * 


Jt=0  i=0  '  ' 


where 


M.,t)  = 

By Vandermonde's Binomial Theorem, this reduces to


PiP2U  1  +  mP(i'S)Pi  ■ 
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Since  we  are  seeking  a  lower  bound  on  this  expression,  we  note  that  the  term  in  the 
summation is always greater than 1, and we obtain as a lower bound for equation
(12):

    p_1 p_2^u (t - u).        (13)

Now,  consider  the  second  summation  in  equation  (1) 


•-*-1  t-u-2  /,  lN 

£  £  (  '  W 


+i)-^i+i)jn(‘+r2)-(“+l4(;+,,J: 
P  2 


We  can  bound  this  below  by  only  considering  terms  for  which  i  runs  from  0  to 
t-u-2: 


"xf " f'f ’ ' (* - . 

kst-u  t=0  '  *  ' 

As  in  the  previous  case,  we  can  obtain  a  new  lower  bound,  by  replacing  the  exponent 
for  p2  with  a  larger  expression  that  is  linear  in  i.  Using  the  same  method  as  above, 
this  leads  to  a  lower  bound  on  the  second  summation  of 


mp\  SpT  5  (a  -  2t  +  2u  - 1  +  2) .  (14) 

Similarly, the third summation can be bounded below by the same methods by

    p_1 p_2^u (s - 2t + 2u - ℓ + 2).        (15)

By piecing together equations (13), (14) and (15), we obtain


2u-3(2o-3)-^  +  2 


W(s,uye)>p1p]u  1  [s  -  t  +  U  -  t  +  2]  1  -  (t  -  u)mp\~6p2  2 


Proposition 3: Given a uniform distribution of correct data features among
the spurious, and given the previously derived expression for the binary probability
of consistency:

    p_2 = (\kappa / m)^2

the total amount of search expected is bounded by


w(s)  +  t^+v*-1  (l  +  (t  -  1)^) 


tPr’ 

1-pf- 
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where 


    \gamma = (s - 3)\mu
    i_0 = \lfloor (s - 3)\mu - 1 \rfloor
    j_0 = \lfloor \kappa^2 - 1 \rfloor

,  -  2-#3+i 

(3  =  mp\  sp2  3 

c  =  s-‘-5G  +  1) 


and  by 


W(s)  >  -  +  mpi 
0 


Li  ^2  \  ~  r  i  /  ,  

[' 1  -  p\  b~  ^2“  +  i  3 


/  1  -  P<?-S)t 

+a(aT^P^ 


P2(!  ,  ^P2* 

(I-P2)2  1  -P2 

ptS)(l-ptS)t) 


-b 


(1-P(23_5))2 


btp<?-6)t 


where 


1-i 


a  =  mp1  p2 


.(17) 


Proof: The previous claim gives us a lower bound on the expected search of
a particular subtree. How do we use it to bound the search of the whole tree?
Under the assumption that the c correct data features are uniformly interspersed
throughout the full set of s data features, we can see that at the top level of the tree,
we must search m subtrees, with u = 0, ℓ = 1; that is, for each possible assignment
of the first data feature to a real model feature, we must explore the appropriate
subtree. Since the first data feature is not part of the true object, once we have
exhausted these subtrees, we must move on to interpretations that exclude the first
data point, by considering the portion of the tree below the node that pairs the first
data point to the null character. Under this node, we consider pairings of the second
data feature. Again, we must consider m subtrees, with u = 0, ℓ = 2. We continue
this process until we reach level ℓ = s/c. In this case, we have a data feature that
does have a correct match, and on average this will be found after we have searched
m/2 subtrees at this level. We then repeat this process below this node in the tree,
with u = 1, and so on. Hence, the expected total amount of search is given by:

To derive a lower bound on this expression, we can substitute from equation (16),
and execute the summation over i to obtain:

/i\  t-i 


W(s)  >  t  Q')  +mpi  ^2  [a  -  bj\  1  +  a(<  -  j)p£  s)} 

.  '  '  j= o  «=i 

where 

a  =  mp\  >2  • 

To further simplify this expression, we can use the identity for the geometric
progression:

    \sum_{i=1}^{t-1} q^i = \frac{q^t - q}{q - 1}

and a derivative of this to yield:

    \sum_{i=1}^{t-1} i q^i = \frac{q(1 - q^t)}{(1 - q)^2} - \frac{t q^t}{1 - q}
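
Both closed forms are easy to confirm numerically. The sketch below assumes the sums run over i = 1, ..., t - 1 and compares each closed form against a direct summation; the function names are illustrative only.

```python
def geometric_sum(q: float, t: int) -> float:
    """Closed form for sum_{i=1}^{t-1} q**i, valid for q != 1."""
    return (q ** t - q) / (q - 1)


def weighted_geometric_sum(q: float, t: int) -> float:
    """Closed form for sum_{i=1}^{t-1} i * q**i, valid for q != 1,
    obtained by differentiating the geometric progression."""
    return q * (1 - q ** t) / (1 - q) ** 2 - t * q ** t / (1 - q)


def direct_sums(q: float, t: int) -> tuple:
    """Term-by-term versions of the two sums, for comparison."""
    plain = sum(q ** i for i in range(1, t))
    weighted = sum(i * q ** i for i in range(1, t))
    return plain, weighted
```

In the proof, q plays the role of a power of p_2, so it lies strictly between 0 and 1 and the division by 1 - q is safe.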


to  get 


.... .  t  r  i-p^  btpV 


l  i O-pS3-*’)’  i-j4s-‘7  r  ’ 


We  can  also  derive  an  upper  bound  on  the  total  search  involved,  by  considering: 

t-i  f 

j=0 i= 1 

1  *-»  j 

=  tx+mEE  p + *)• 

j=0  i=l 

We  can  substitute  from  equation  (10)  and  reduce  the  summation  over  i  by  bounding 
terms  from  above  to  yield 
i  <-i 

II//  ^  j  i  mPl  \  '  T/J  -\io(i)+l  .  .  jnf  jl-l 


w(«)  <  <  j  +  ^p-  [(*  -  j)io(i)+Vio(i)_1 
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where 

,  .  a-«34« 

13  =  mp\~sp2  1  . 

To  reduce  this,  we  note  from  our  previous  analysis  that 

3o(j)  <  [*2  -  lj 

and  similarly 

*«(.M  -  \j  -  2)  <  [(a  -  3)p  -  lj. 


This  yields 


iv(«)  <  4 + ^ 


t- 1 


5>-j)io+v°-v2 


i=o 


t-i 


+ (i + *>?■'“)  (« - <#>  (<  -  j  - 1>7‘" 


where 


i=o 

    \gamma = (s - 3)\mu


-(}-) 


    i_0 = \lfloor (s - 3)\mu - 1 \rfloor
    j_0 = \lfloor \kappa^2 - 1 \rfloor.

We  can  reduce  the  remaining  summations  by  expanding  out  the  first  term,  then 
bounding  each  remaining  term  in  the  summation  by  the  largest  term,  which  in  this 
case  is  the  second  term.  Using  the  results  from  above  on  the  geometric  progression 
and  its  derivatives,  this  leads  to 


w(s)  +  [ffc+V*-1  (l  +  (t  -  1)^) 

1  -  P2 


+  7 '°c(t  -  1) 


1  -P2 


+  /? 


1-PT‘\ 


+/? 


)1 

(i -pi-5)2 

1-P2_5. 

JJ 

(18) 


Proposition 4: If the sensory data arising from a correct interpretation are
uniformly distributed among the spurious data, then the amount of search expended
by the normal constrained search method is bounded by

mS-2c<Wocc  <ml2c  +  -[l  +  e}a  [l +  -§-]'  *  +  ^  -  [1  +  p2]c . 
c  c  e  L  1  +  «2cl 

Proof: 

In  [Grimson  1989]  we  showed  that  the  number  of  nodes  at  the  kth  level  of  the 
tree  is  bounded  by 

2c{k)  <  nk  <  2c(-k)  +  1  +  mpipfj  [l+P2  +  mpipf 

+  mpi  1  -  p2(  ^  j  [1  +  [fc  +  ~  c(*0)] 

where c(k) is the number of data features actually part of the correct interpretation.
Using our earlier assumption that

    c(k) = \delta k

we can estimate bounds on the amount of search in the normal case by considering

ili  i -I  k—6k  1 1 6fc 

m  2J  nk  <  m  2Sk  +  1  +  mpipj  1  +  P2  +  mpipj 

k= 1  fc=l 

+  mpi  1  -p^~  [1  +  P2]**-1  [k  +  P2(fc  -  6k)} . 


[1  +  P2]Sk~1  [fc  +  P2  (fc-tffc)]. 


Consider  the  first  term: 


Actually,  if  we  are  careful  in  our  considerations,  this  sum  is  really 


and  this  reduces  to 


Similarly,  the  second  term  is 


4V2*  <  m-2c 

6  t—*  ~  r 


m^tl  +  e]*  L,5fcJ  [1  +  P2  +  e]1 


where 


This  reduces  to 


e  =  mpipj  =  «pi . 


C-1  /i-1  \  -il 

SIS' . .  h*l 


and  by  using  the  same  trick  of  finding  the  maximal  term  in  a  sum,  this  reduces  to 


7|1  +  e1’  [1  +  T77 
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A  similar  argument  can  be  applied  to  the  remaining  term  in  the  summation,  yielding 

2“  l1  +P2lC- 

P2  C 

Combining  all  three  of  these  bounds  together,  we  have 

I  C— 1 


Wocc  <  m-2c  +  -  [1  +  (]' 

C  € 


i  + 


n 


l  +  e 


m3s  s  1C 

+  —Jl+»  i 
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